{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "### Lab 9 - Computing probabilities 2\n", "\n", "This lab will use the 311 service request dataset from NYC Open Data that you downloaded in Lab 8.\n", "\n", "In this lab, we will look at more complicated filters and at probabilities involving *and* and *or*.\n", "\n", "As usual, we will import the matplotlib and pandas packages, and set plots to appear in the Jupyter notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, write code to load your 311 data into a dataframe called `calls`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Check that the dataframe was created properly by displaying it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose we want to know what percentage of complaints are about noise. The formula for this probability is: \n", "$\\text{probability a call is about noise} = \\frac{\\text{# of calls about noise}}{\\text{total # of calls}}$\n", "\n", "But there are different types of noise complaints. To get a list of all unique values in the `Complaint Type` column, type `calls[\"Complaint Type\"].unique()` below and run the code. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "How many different types of noise complaints are there?\n", "\n", "There are seven:\n", "- `Noise - Vehicle`\n", "- `Noise - Residential`\n", "- `Noise - Commercial`\n", "- `Noise`\n", "- `Noise - Street/Sidewalk`\n", "- `Noise - House of Worship`\n", "- `Collection Truck Noise`\n", "\n", "We want to count all of these noise complaints. To do so, we will make a filter for each one:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "noise_vehicle_filter = calls[\"Complaint Type\"] == \"Noise - Vehicle\"\n", "noise_res_filter = calls[\"Complaint Type\"] == \"Noise - Residential\"\n", "noise_commercial_filter = calls[\"Complaint Type\"] == \"Noise - Commercial\"\n", "noise_filter = calls[\"Complaint Type\"] == \"Noise\"\n", "noise_street_filter = calls[\"Complaint Type\"] == \"Noise - Street/Sidewalk\"\n", "noise_worship_filter = calls[\"Complaint Type\"] == \"Noise - House of Worship\"\n", "noise_truck_filter = calls[\"Complaint Type\"] == \"Collection Truck Noise\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then we combine the filters using the `|` symbol, which means *or*. A row of `all_noise` is `True` if the first filter is `True` *or* the second filter is `True` *or* the third filter is `True`, and so on. In other words, a row only has to satisfy one of the filters to be marked as `True` in `all_noise`.\n", "\n", "The backslash (`\\`) at the end of the first line tells Python that the statement continues on the next line; it just splits one long line of code into two for readability."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "all_noise = noise_vehicle_filter | noise_res_filter | noise_commercial_filter | \\\n", "    noise_filter | noise_street_filter | noise_worship_filter | noise_truck_filter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To check that `all_noise` is correct, let's select all rows of the dataframe that have a `True` value in `all_noise`. Type `calls[all_noise]` below and run it." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Write code below to count how many values in `all_noise` are `True`. We did this in Lab 8." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " all_noise.sum()\n", "
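A minimal sketch of what the `|` filters and the `True` count do, run on a made-up dataframe `toy_calls` (a hypothetical stand-in for `calls` with only a few illustrative rows). It also shows one alternative to writing a filter per type: collecting the noise types from `unique()` and passing them to `Series.isin`, which marks a row `True` if its value appears in the list.

```python
import pandas as pd

# Made-up stand-in for calls (not real 311 records)
toy_calls = pd.DataFrame({
    'Complaint Type': ['Noise - Residential', 'HEAT/HOT WATER', 'Noise',
                       'Collection Truck Noise', 'Blocked Driveway'],
})

# One equality filter per noise type, combined with | (element-wise or)
res_filter = toy_calls['Complaint Type'] == 'Noise - Residential'
plain_filter = toy_calls['Complaint Type'] == 'Noise'
truck_filter = toy_calls['Complaint Type'] == 'Collection Truck Noise'
any_noise = res_filter | plain_filter | truck_filter
print(any_noise.tolist())  # [True, False, True, True, False]

# True counts as 1 and False as 0, so sum() counts the matching rows
print(any_noise.sum())  # 3

# Alternative: collect the noise types from unique(), then use isin()
noise_types = [t for t in toy_calls['Complaint Type'].unique() if 'Noise' in t]
same_mask = toy_calls['Complaint Type'].isin(noise_types)
print((same_mask == any_noise).all())  # True
```

The `'Noise' in t` test is case-sensitive and would also pick up any other complaint type whose name happens to contain Noise, so it is worth checking the resulting list by eye.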
\n", "\n", "How many noise complaints of any type are there?\n", "\n", "Save the total number of noise complaints as a variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now you can compute the probability that a call is about noise. Look back at Lab 8 if you need help computing the denominator (the total number of calls) or the probability itself." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "What percentage of calls are about noise? Are you surprised?\n", "\n", "What if we wanted to find the probability that a call is a residential noise complaint from the Bronx?\n", "\n", "The formula is:\n", "$\\text{probability a call is a residential noise complaint from the Bronx} = \\frac{\\text{# calls that are residential noise complaints and from the Bronx}}{\\text{total # of calls}}$\n", "\n", "First create two filters: one to check whether a complaint is about residential noise, and one to check whether the `Borough` column has the value `BRONX` (the borough names are stored in all caps)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " noise_filter = calls[\"Complaint Type\"] == \"Noise - Residential\"\n", "bronx_filter = calls[\"Borough\"] == \"BRONX\"\n", "
\n", "\n", "This time we want a row to be `True` only if both filters are `True`, so we put the symbol `&`, which means *and*, between the two filters. The code is below, but you may need to change the filter names if you called yours something else." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "bronx_and_res_noise = noise_filter & bronx_filter" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Count the number of `True` values in `bronx_and_res_noise` to find the number of residential noise complaints from the Bronx:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " bronx_and_res_noise.sum()\n", "
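To see what `&` does row by row, here is a small sketch on a made-up dataframe `toy_calls` (hypothetical rows, not real 311 data). It also shows a common pitfall: when the comparisons are written inline, each one needs its own parentheses, because in Python `&` binds more tightly than `==`.

```python
import pandas as pd

# Made-up rows standing in for calls; real 311 data has many more columns
toy_calls = pd.DataFrame({
    'Complaint Type': ['Noise - Residential', 'Noise - Residential',
                       'HEAT/HOT WATER', 'Noise - Residential'],
    'Borough': ['BRONX', 'QUEENS', 'BRONX', 'BRONX'],
})

res_noise = toy_calls['Complaint Type'] == 'Noise - Residential'
bronx = toy_calls['Borough'] == 'BRONX'

# & is element-wise and: a row is True only where both filters are True
both = res_noise & bronx
print(both.tolist())  # [True, False, False, True]
print(both.sum())  # 2

# Written in one line, each comparison needs parentheses,
# because & binds more tightly than == in Python
inline = (toy_calls['Complaint Type'] == 'Noise - Residential') & (toy_calls['Borough'] == 'BRONX')
print((inline == both).all())  # True
```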
\n", "\n", "As before, store the number of these complaints in the variable `num_bronx_and_res_noise`:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, can you compute the probability that a call is a residential noise complaint from the Bronx?" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " num_bronx_and_res_noise/num_calls\n", "
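Two small conveniences, sketched on a made-up `toy_calls` dataframe (hypothetical data): `shape[0]` gives the total number of rows, which is the denominator, and because `True` counts as 1 and `False` as 0, the mean of a boolean filter equals the number of `True` values divided by the number of rows, which is the same probability. Python's `%` format specifier then turns it into a percentage.

```python
import pandas as pd

# Made-up rows standing in for calls
toy_calls = pd.DataFrame({
    'Complaint Type': ['Noise - Residential', 'Noise - Residential', 'HEAT/HOT WATER',
                       'Noise - Residential', 'Blocked Driveway'],
    'Borough': ['BRONX', 'QUEENS', 'BRONX', 'BRONX', 'BROOKLYN'],
})

num_calls = toy_calls.shape[0]  # number of rows = total number of calls
both = (toy_calls['Complaint Type'] == 'Noise - Residential') & (toy_calls['Borough'] == 'BRONX')

prob = both.sum() / num_calls
print(prob)  # 0.4

# Mean of a boolean Series = fraction of True values = same probability
print(both.mean())  # 0.4

# Format the probability as a percentage
print(f'{prob:.0%}')  # 40%
```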
\n", "\n", "What percentage of 311 calls are residential noise complaints from the Bronx?\n", "\n", "Note that this probability is also the proportion of 311 calls that are residential noise complaints from the Bronx. The probability and the proportion are the same here because we are estimating the probability from the data.\n", "\n", "Sometimes it's hard to interpret a single probability. Let's compare the probability that a call from the Bronx is about no heat/hot water with the probability that a call from Manhattan is about no heat/hot water.\n", "\n", "First, write down the formula for computing the probability that a call from the Bronx is about no heat/hot water.\n", "\n", "
Answer:\n", "$\\text{probability a call from the Bronx is about no heat/hot water} = \\frac{\\text{# calls from the Bronx about no heat/hot water}}{\\text{# of calls from the Bronx}}$\n", "
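Dividing the top and bottom of this fraction by the total number of calls shows how this kind of probability relates to *and* probabilities like the Bronx residential noise one computed earlier. Writing P( ) for a probability estimated from the data:

$$\\frac{\\text{# calls from the Bronx about no heat/hot water}}{\\text{# of calls from the Bronx}} = \\frac{(\\text{# calls from the Bronx about no heat/hot water}) / (\\text{total # of calls})}{(\\text{# of calls from the Bronx}) / (\\text{total # of calls})} = \\frac{P(\\text{from the Bronx and about no heat/hot water})}{P(\\text{from the Bronx})}$$

In words: the probability that a call from the Bronx is about no heat/hot water is the probability that a call is both from the Bronx and about no heat/hot water, divided by the probability that a call is from the Bronx.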
\n", "\n", "Next, compute the number of calls from the Bronx and save it in a variable." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " bronx_filter = calls[\"Borough\"] == \"BRONX\" \n", "num_calls_bronx = bronx_filter.sum()\n", "
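The filter-and-sum approach counts one borough at a time. As a sketch on made-up data (the `toy_calls` dataframe is hypothetical), pandas' `value_counts()` counts every borough in one step, and its `BRONX` entry matches the filter-and-sum count:

```python
import pandas as pd

# Made-up stand-in for the Borough column of calls
toy_calls = pd.DataFrame({'Borough': ['BRONX', 'QUEENS', 'BRONX', 'MANHATTAN', 'BRONX']})

# Filter-and-sum, one borough at a time
num_calls_bronx = (toy_calls['Borough'] == 'BRONX').sum()

# value_counts() counts every distinct value at once, most common first
borough_counts = toy_calls['Borough'].value_counts()
print(borough_counts)
print(borough_counts['BRONX'])  # 3
```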
\n", "\n", "Next, compute the number of calls from the Bronx about no heat/hot water and save it in a variable. These calls will have `BRONX` in the `Borough` column and `HEAT/HOT WATER` in the `Complaint Type` column." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " heat_filter = calls[\"Complaint Type\"] == \"HEAT/HOT WATER\"\n", "heat_bronx_filter = heat_filter & bronx_filter\n", "num_calls_bronx_heat = heat_bronx_filter.sum()\n", "
\n", "\n", "Finally, compute the probability that a call from the Bronx is about no heat/hot water." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " num_calls_bronx_heat/num_calls_bronx\n", "
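Another way to read this probability: keep only the Bronx rows, then ask what fraction of them are heat complaints. The mean of a boolean Series is the fraction of `True` values, so both routes give the same number. A sketch on a hypothetical `toy_calls` dataframe with made-up rows:

```python
import pandas as pd

# Made-up rows standing in for calls
toy_calls = pd.DataFrame({
    'Complaint Type': ['HEAT/HOT WATER', 'Noise', 'HEAT/HOT WATER', 'HEAT/HOT WATER', 'Noise'],
    'Borough': ['BRONX', 'BRONX', 'MANHATTAN', 'BRONX', 'BRONX'],
})

bronx_filter = toy_calls['Borough'] == 'BRONX'
heat_filter = toy_calls['Complaint Type'] == 'HEAT/HOT WATER'

# Method from the lab: count of (heat and Bronx) divided by count of Bronx
p1 = (heat_filter & bronx_filter).sum() / bronx_filter.sum()

# Same thing: keep only the Bronx rows, then take the fraction that are heat calls
bronx_calls = toy_calls[bronx_filter]
p2 = (bronx_calls['Complaint Type'] == 'HEAT/HOT WATER').mean()
print(p1, p2)  # 0.5 0.5
```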
\n", "\n", "Next we'll compute the probability that a call from Manhattan is about no heat/hot water. Try to do this below. If you need to add more code cells, you can do it by clicking Insert in the menu." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Answer:\n", " # Count the number of calls from Manhattan\n", "manhattan_filter = calls[\"Borough\"] == \"MANHATTAN\" \n", "num_calls_manhattan = manhattan_filter.sum()\n", "# Count the number of calls from Manhattan about no heat/hot water.\n", "heat_filter = calls[\"Complaint Type\"] == \"HEAT/HOT WATER\"\n", "heat_manhattan_filter = heat_filter & manhattan_filter\n", "num_calls_manhattan_heat = heat_manhattan_filter.sum()\n", "# Compute the probability that a call from Manhattan is about no heat/hot water.\n", "num_calls_manhattan_heat/num_calls_manhattan\n", "
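The Bronx and Manhattan answers repeat the same three steps, so they can be wrapped in a function. `prob_heat_given_borough` below is a hypothetical helper written for this sketch (it is not part of pandas), shown on made-up data:

```python
import pandas as pd

# Made-up rows standing in for calls
toy_calls = pd.DataFrame({
    'Complaint Type': ['HEAT/HOT WATER', 'Noise', 'HEAT/HOT WATER',
                       'Noise', 'Noise', 'HEAT/HOT WATER'],
    'Borough': ['BRONX', 'BRONX', 'MANHATTAN', 'MANHATTAN', 'MANHATTAN', 'BRONX'],
})


def prob_heat_given_borough(df, borough):
    # Hypothetical helper (not part of pandas): estimated probability that
    # a call from the given borough is about no heat/hot water
    borough_filter = df['Borough'] == borough
    heat_filter = df['Complaint Type'] == 'HEAT/HOT WATER'
    num_borough_heat = (heat_filter & borough_filter).sum()
    num_borough = borough_filter.sum()
    return num_borough_heat / num_borough


for borough in ['BRONX', 'MANHATTAN']:
    print(borough, prob_heat_given_borough(toy_calls, borough))
```

With the real data, calling `prob_heat_given_borough(calls, 'BRONX')` should reproduce the Bronx number computed above, assuming your dataframe uses the same column names and all-caps borough names.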
\n", "\n", "How does the probability that a call from Manhattan is about no heat/hot water compare to the probability that a call from the Bronx is about no heat/hot water?\n", "\n", "#### Challenges:\n", "- What is the probability that a call is from Brooklyn or Queens?\n", "- What is the probability that the location type is `Street/Sidewalk` and the call is from Staten Island?\n", "- What is the probability that a call about no heat/hot water is from the Bronx? (Note that this is different from the probability that a call from the Bronx is about no heat/hot water.)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }